An annotated corpus with nanomedicine and pharmacokinetic parameters

Authors

Nastassja A Lewinski

Ivan Jimenez

Bridget T McInnes

Abstract

A vast amount of data on nanomedicines is being generated and published, and natural language processing (NLP) approaches can automate the extraction of unstructured, text-based data. Annotated corpora are a key resource for NLP and for information extraction methods that employ machine learning. Although corpora are available for pharmaceuticals, resources for nanomedicines and nanotechnology are still limited. To foster nanotechnology text mining (NanoNLP) efforts, we have constructed a corpus of annotated drug product inserts taken from the US Food and Drug Administration's Drugs@FDA online database. In this work, we present the development of the Engineered Nanomedicine Database corpus to support the evaluation of nanomedicine entity extraction. The data were manually annotated for 21 types of entity mentions covering the physicochemical characterization, exposure, and biological response information of 41 FDA-approved nanomedicines. We evaluate the reliability of the manual annotations and demonstrate the use of the corpus by evaluating two state-of-the-art named entity extraction systems, OpenNLP and Stanford NER. The annotated corpus is openly available, and, based on these results, we provide guidelines and suggestions for the future development of additional nanomedicine corpora.
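The abstract does not say which measure was used to assess annotation reliability or to score OpenNLP and Stanford NER. A common choice for both tasks is exact-match, entity-level precision, recall, and F1, and the sketch below illustrates that computation under that assumption. The `Span` type, the `span_prf` function, and the entity labels in the example (`DRUG`, `PARTICLE_SIZE`, `HALF_LIFE`, `DOSE`) are hypothetical and are not taken from the corpus.

```python
from typing import Iterable, NamedTuple, Tuple


class Span(NamedTuple):
    """A labeled entity mention: character offsets [start, end) and an entity type (hypothetical schema)."""
    start: int
    end: int
    label: str


def span_prf(reference: Iterable[Span], predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 between two collections of entity spans.

    A predicted span counts as correct only if its start, end and label all match
    a reference span; partial overlaps receive no credit. Duplicates are ignored.
    """
    ref, pred = set(reference), set(predicted)
    true_pos = len(ref & pred)  # spans on which both sides agree exactly
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical annotations for one product-insert passage (labels are illustrative only).
    annotator_a = [Span(0, 8, "DRUG"), Span(20, 27, "PARTICLE_SIZE"), Span(40, 48, "HALF_LIFE")]
    annotator_b = [Span(0, 8, "DRUG"), Span(20, 27, "PARTICLE_SIZE"), Span(50, 55, "DOSE")]
    print(tuple(round(x, 3) for x in span_prf(annotator_a, annotator_b)))  # → (0.667, 0.667, 0.667)
```

Swapping the two inputs only exchanges precision and recall, so F1 is symmetric; this is one reason pairwise F1 is often used as an agreement measure between two annotators on span annotations. A looser matching rule (for example, crediting partial overlaps) would give different numbers.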

Similar resources

Presenting a Religious Question-Answering Corpus in Persian

Question answering is a field of natural language processing and information retrieval that has attracted researchers' attention in recent decades. Due to growing interest in this field of research, the need for appropriate data sources has become apparent. Most research on developing question-answering corpora has so far been done in English, but in other languages such as Persian, the lack of these co...

Corpus based coreference resolution for Farsi text

"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two expressions are coreferent when both refer to a single entity in the text or in the real world. The main task of coreference resolution systems is therefore to identify terms that refer to a unique entity. A coreference resolution tool could be...

Using Natural Language Processing to Extract Drug-Drug Interaction Information from Package Inserts

The package insert (aka drug product label) is the only publicly-available source of information on drug-drug interactions (DDIs) for some drugs, especially newer ones. Thus, an automated method for identifying DDIs in drug package inserts would be a potentially important complement to methods for identifying DDIs from other sources such as the scientific literature. To develop such an algorith...

Boosting drug named entity recognition using an aggregate classifier

OBJECTIVE Drug named entity recognition (NER) is a critical step for complex biomedical NLP tasks such as the extraction of pharmacogenomic, pharmacodynamic and pharmacokinetic parameters. Large quantities of high quality training data are almost always a prerequisite for employing supervised machine-learning techniques to achieve high classification performance. However, the human labour neede...

Determination of Isosorbide Dinitrate in Serum by Gas Chromatography with New Generation of Electron Capture Detector and its Application in Pharmacokinetic Study

Isosorbide dinitrate (ISDN) is an effective drug in treatment of angina pectoris. In this study a new generation of electron capture detector (non-radioactive) with a short, non-polar and wide-bore column was used for analysis of ISDN in human serum. ISDN was extracted from serum by a mixture of ether and ethyl acetate and concentrated at room temperature. The method was linear between 5-50...

Journal title:

Volume 12, Issue

Pages: -

Publication date: 2017
